Leading Professional Society for Computational Biology and Bioinformatics
Connecting, Training, Empowering, Worldwide

Posters

View Posters By Category

Session A: (July 22 and July 23)
Session B: (July 24 and July 25)

Presentation Schedule for July 22, 6:00 pm – 8:00 pm

Presentation Schedule for July 23, 6:00 pm – 8:00 pm

Presentation Schedule for July 24, 6:00 pm – 8:00 pm

Session A Poster Set-up and Dismantle
Session A Posters set up: Monday, July 22 between 7:30 am - 10:00 am
Session A Posters should be removed at 8:00 pm, Tuesday, July 23.

Session B Poster Set-up and Dismantle
Session B Posters set up: Wednesday, July 24 between 7:30 am - 10:00 am
Session B Posters should be removed at 2:00 pm, Thursday, July 25.

Q-01: An automatic machine learning tool for prediction of immune-evasion mechanisms in cancer samples
COSI: CAMDA COSI
  • Andreia Rogerio, Universidade de Lisboa, Portugal
  • Claudia Antunes, Universidade de Lisboa, Portugal

Short Abstract: The incidence of cancer in society has become a growing concern and has encouraged the development of new treatments, like immunotherapy, which boosts the body's natural defenses to fight cancer. Unfortunately, cancer cells have developed strategies to evade patients’ immune systems, called immune-evasion mechanisms. To tackle this problem, I applied machine learning techniques to The Cancer Genome Atlas data, which includes 10,000 tumour samples with genomic and molecular features. Two immune-evasion mechanisms were predicted: low load of neoantigens and high load of regulatory T-cells. Several classifiers were fitted to the data, including Naive Bayes, Logistic Regression and Random Forest, achieving cross-validation accuracy scores of 0.70, 0.72 and 0.76 for the load of neoantigens, and 0.65, 0.70 and 0.74 for the regulatory T-cells class, respectively. A feature importance study revealed that patients with lower levels of neoantigens have high levels of plasma cells and high homologous recombination deficiency, while patients with high levels of regulatory T-cells have high levels of T-cell receptors of type A and high levels of plasma cells, among other features. These conclusions promote personalized medicine and the application of immunotherapy when suitable. A web tool is under development to allow user interaction, applying feature selection or engineering techniques.

Q-02: Evaluation of Connectivity Map shows limited reproducibility in drug repositioning
COSI: CAMDA COSI
  • Nathaniel Lim, The University of British Columbia, Canada
  • Paul Pavlidis, The University of British Columbia, Canada

Short Abstract: The Connectivity Map (CMap) is a widely used resource enabling data-driven drug repositioning using a large compendium of gene expression profiles. However, evaluations of its performance are limited. We took advantage of the availability of two iterations of CMap (CMap 1 and CMap 2) to assess their comparability and reliability. First, we queried CMap 2 with 28 drug signatures derived from CMap 1, hypothesizing that CMap 2 would highly prioritize the same drugs. We found that CMap 2 succeeded only 2/28 times. In a similar analysis, CMap 2 was unable to replicate previously published drug prioritization recommendations that were originally obtained using CMap 1 from three studies. Next, we compared the similarity of individual differential expression profiles for the same conditions between both CMap versions (109 profiles), and also compared a third dataset (De Abrew et al. (2016), 12 compounds). We found that the profiles were highly dissimilar among all data sets (mean correlation < 0.1), likely explaining the limited reproducibility of prioritization. Because of the general lack of consistency, it is unclear which CMap version is more reliable. Our findings have implications for the use of CMap and suggest steps investigators can take to limit false positives.

Q-03: A Novel Gene Selection Method for Gene Expression Data for the Task of Cancer Type Classification
COSI: CAMDA COSI
  • Arzucan Ozgur, Bogazici University, Turkey
  • Nuriye Özlem Özcan Şimşek, Boğaziçi University, Turkey
  • Fikret Gurgen, Bogazici University, Turkey

Short Abstract: Genomic data can be utilized for the diagnosis of many diseases, such as cancer. Cancer is caused by mutations in DNA. These mutations may be active or suppressed, and their state can be identified from gene expression. In this study, we transfer information about the effect of mutations on the development of cancer into a novel gene selection method for gene expression data. We tested the proposed method on diagnosing and differentiating cancer types. Our experimental results show that the proposed gene selection method leads to similar or improved performance metrics compared to classical feature selection methods and curated gene sets.

Q-04: Systematic evaluation of microbial abundance from amplicon and shotgun sequencing for machine learning prediction of sample origin
COSI: CAMDA COSI
  • Julie Chih-Yu Chen, Public Health Agency of Canada - National Microbiology Laboratory, Canada
  • Andrea Tyler, Public Health Agency of Canada - National Microbiology Laboratory, Canada

Short Abstract: Recent technological advances have provided different ways to measure microbial abundances in collected samples. The 16S ribosomal RNA amplicon sequencing approach targets and specifically sequences the 16S rRNA gene of bacteria and archaea, whereas the shotgun whole genome sequencing (WGS) approach sequences all the DNA present in a sample. The latter thus allows identification to the level of species, evaluation of functional units, and concurrent identification of eukaryotes, fungi and DNA viruses. With the abundant data from different environmental sources and protocols in the CAMDA challenge, we first set out to evaluate differences in organism abundance among datasets generated using different sequencing technologies. Subsequently, we performed a supervised machine learning approach for predicting sample origin using these data. Current prediction models focus on classification methods such as support vector machines, random forest or Bayesian approaches. However, the predictions of these classification models are limited to sources where the training samples were from. Hence, we propose to model the longitude and latitude as the outcome variables using multivariate regression to enable predictions of new origins and using Lasso regularization to enhance prediction accuracy and avoid model overfitting.
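
The idea of regressing latitude and longitude on abundance features with an L1 penalty can be sketched as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, the solver is a plain proximal gradient (ISTA) loop, and the penalty is applied to each output column independently. The function names, the penalty strength `alpha` and the feature layout are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(values, threshold):
    """Shrink values towards zero; entries within the threshold become exactly zero."""
    return np.sign(values) * np.maximum(np.abs(values) - threshold, 0.0)

def fit_lasso(X, Y, alpha, n_iter=500):
    """Minimise (1/2n)||Yc - Xc W||^2 + alpha * sum|W| by proximal gradient descent."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean           # centring handles the intercept
    n = X.shape[0]
    step = n / np.linalg.norm(Xc, 2) ** 2     # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        gradient = Xc.T @ (Xc @ W - Yc) / n
        W = soft_threshold(W - step * gradient, step * alpha)
    intercept = y_mean - x_mean @ W
    return W, intercept

# Synthetic stand-in for abundance features (assumption: 200 samples, 10 taxa).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_W = np.zeros((10, 2))
true_W[0] = [5.0, 0.0]     # hypothetical taxon that tracks latitude
true_W[1] = [0.0, -8.0]    # hypothetical taxon that tracks longitude
Y = X @ true_W + np.array([40.0, -75.0])   # columns: latitude, longitude

W, intercept = fit_lasso(X, Y, alpha=0.1)
predicted = X @ W + intercept
print("mean squared error:", float(np.mean((predicted - Y) ** 2)))
print("non-zero coefficients:", int(np.count_nonzero(W)))
```

Because the outputs are continuous coordinates rather than city labels, such a model can in principle return a location that never appeared in the training set, which is the motivation stated in the abstract.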

Q-05: Steps towards predictive models for DILI based on chemical structure and gene expression signatures and their interpretation
COSI: CAMDA COSI
  • Anika Liu, University of Cambridge, United Kingdom
  • Peter Wright, University of Cambridge, United Kingdom
  • Aleksandra Bartosik, University of Cambridge, United Kingdom
  • Daniela Dolciami, University of Cambridge, United Kingdom
  • Moritz Walter, University of Cambridge, United Kingdom
  • Andreas Bender, University of Cambridge, United Kingdom

Short Abstract: Drug-induced liver injury (DILI) is a major safety concern in drug development. One approach to detecting DILI early is to use cellular readouts such as gene expression profiles. However, it is unclear how much signal these contain with respect to DILI compared to molecular structure. We identified 13 DILI-related proteins based on target prediction, such as Prostaglandin E synthase and c-Jun-N-terminal synthase 2, and 4 scaffolds from substructural mining including benzenesulfonamides. Random Forest models based on chemical structure afforded better performance than those using gene expression with balanced accuracies of 0.70 and 0.55 during cross-validation, respectively. However, this is only a preliminary result as the number of drugs varied between the gene expression- and the chemical structure-based models, and the models’ ability to extrapolate to novel chemical space has not yet been evaluated. A deeper analysis of the LINCS data has been difficult due to its noisiness and the limited coverage of compounds and genes. Improved results might be obtained by different data processing and filtering based on therapeutic doses. Further work will continue to evaluate the usefulness of gene expression and molecule structure with respect to understanding and predicting DILI and will also explore their complementarity.
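
For readers unfamiliar with the metric, balanced accuracy is the mean of sensitivity and specificity, so a value near 0.5 indicates performance close to chance regardless of class imbalance. The sketch below uses made-up labels (assumption: 1 = DILI, 0 = no DILI) and is not taken from the authors' pipeline.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (both classes must be present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # fraction of DILI drugs that are detected
    specificity = tn / (tn + fp)   # fraction of non-DILI drugs that are cleared
    return (sensitivity + specificity) / 2

# Hypothetical imbalanced example: 2 DILI drugs among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
print(plain_accuracy)                               # 0.8
print(balanced_accuracy(y_true, always_negative))   # 0.5
```

A classifier that never predicts DILI scores 0.8 on plain accuracy here but only 0.5 on balanced accuracy, which is why the latter is the more informative figure for imbalanced toxicity data.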

Q-06: Integration of human cell lines gene expression and chemical properties of drugs for Drug Induced Liver Injury prediction
COSI: CAMDA COSI
  • Wojciech Lesinski, University of Bialystok, Poland
  • Agnieszka Kitlas Golinska, University of Bialystok, Poland
  • Krzysztof Mnich, University of Białystok, Poland
  • Witold R. Rudnicki, University of Białystok and ICM University of Warsaw, Poland

Short Abstract: Motivation: Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI, based on the chemical properties of substances and experiments performed on cell lines, can significantly reduce the cost of clinical trials. The current study aims to build predictive models of DILI for drugs using both their chemical properties and gene expression levels in cell lines treated with them. Methods: We built cross-validated Random Forest predictive models using gene expression from 13 human cell lines and molecular properties of drugs. In this process we identified the most informative variables and built models on them. Models were built for the expression profiles of individual cell lines and for chemical properties, as well as with different methods of integrating them. Results: Models that used molecular descriptors alone, and models that used expression profiles from some cell lines, were only weakly predictive (AUC between 0.55 and 0.61). The individual models were then integrated using the Super Learner approach. Accuracy improved significantly for the composite model (AUC=0.74, MCC=0.34), which allows drug compounds to be divided into low-risk and high-risk classes.
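
The two metrics quoted in the results can be computed as in the sketch below. It is illustrative only: the labels, scores and the 0.45 decision threshold are invented, and the functions are plain-Python definitions rather than the authors' evaluation code.

```python
from math import sqrt

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def mcc(y_true, y_pred):
    """Matthews correlation coefficient computed from the binary confusion matrix."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

# Hypothetical risk scores for six drugs (1 = high DILI risk).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
y_pred = [1 if s >= 0.45 else 0 for s in scores]   # assumed threshold

auc_value = roc_auc(y_true, scores)
mcc_value = mcc(y_true, y_pred)
print(round(auc_value, 3), round(mcc_value, 3))    # 0.889 0.333
```

An AUC of 0.5 and an MCC of 0 both correspond to chance-level prediction, which puts the reported single-model AUCs of 0.55 to 0.61 in context.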

Q-07: Constructing microbial fingerprint for unraveling city-specific signature and identifying sample origin locations
COSI: CAMDA COSI
  • Runzhi Zhang, University of Florida, United States
  • Alejandro Walker, University of Florida, United States
  • Susmita Datta, University of Florida, United States

Short Abstract: Composition of microbial communities can be location specific, and differences in taxon abundance between locations could help us construct a microbial fingerprint for accurately predicting sample origin locations. In this study, the whole genome shotgun (WGS) metagenomics data from samples across several cities around the world were used for constructing the microbial fingerprint. Principal Component Analysis was used to assess the separation of the cities based on combined taxa. Machine learning methods including Random Forest, Support Vector Machine and Linear Discriminant Analysis were used to predict the origin of samples. Raw data were used at first; however, because the low sequencing coverage of the London samples limited the number of common species, families and orders in the final dataset, the London samples were removed. This reduced the error rate of each classifier. Analysis of composition of microbiomes (ANCOM-II) was conducted and showed a pattern of differences in microbial composition between cities. The results of this study highlight the importance of the number of taxa, which could be increased with more samples or greater sequencing depth.

Q-08: An ensemble learning approach for modeling the systems biology of drug-induced injury in human liver
COSI: CAMDA COSI
  • Emre Guney, GRIB (IMIM-UPF), Spain
  • Joaquim Aguirre-Plans, GRIB (IMIM-UPF), Spain
  • Terezinha Souza, Maastricht University, Netherlands
  • Janet Piñero, GRIB (IMIM-UPF), Spain
  • Giulia Callegaro, Leiden University, Netherlands
  • Steven J. Kunnen, Leiden University, Netherlands
  • Laura I. Furlong, GRIB (IMIM-UPF), Spain
  • Baldo Oliva, GRIB (IMIM-UPF), Spain

Short Abstract: Drug-induced liver injury (DILI) has a relatively high incidence rate, estimated to affect around 20 in 100,000 inhabitants worldwide each year. Many drugs ranging from pain killers to anti-tuberculous treatments can cause DILI. Despite DILI being one of the leading causes of acute liver failure, the pathophysiology of DILI is poorly understood and pinpointing the toxicity of compounds in human liver remains non-trivial. Accordingly, several methods have been proposed to predict the hepatotoxicity of compounds. Among these, machine learning models trained using drug structural features have shown good performance. Furthermore, the incorporation of gene- and pathway-level signatures from transcriptomics data has shown high predictive accuracy using Deep Neural Networks. In this work, to predict DILI, we investigated combining gene expression data upon drug treatment from the Connectivity Map (CMap), target binding information and chemical similarity of drugs into ensemble learning methods using random forest classifiers and gradient boosting machines.

Q-09: mi-faser based partition of the CAMDA 2019 mystery samples in the Metagenomic Forensics Challenge
COSI: CAMDA COSI
  • Maximilian Miller, Rutgers University, United States
  • Yana Bromberg, Rutgers University, United States
  • Yannick Mahlich, Technical University of Munich, Germany
  • Chengsheng Zhu, Rutgers University, United States

Short Abstract: Here we present an analysis of the CAMDA 2019 Metagenomic Forensics Challenge data. Our aim was to predict the geographic origins of so-called mystery location metagenome samples. For this, we partitioned the mystery samples into groups (i.e. cities) and compared their functional fingerprint against metagenome samples of known geographic origin. All samples are whole-genome shotgun sequenced microbiomes extracted from subway systems, a worldwide effort of the MetaSUB project. To this end, we used our mi-faser pipeline to functionally profile all 16 known cities provided in the challenge based on their metagenome samples. We created the same functional profiles for each of the mystery location samples. Those samples belong to cities which were not sampled before. Applying t-Distributed Stochastic Neighbor Embedding (t-SNE) and k-means clustering we propose a partition of the mystery samples of unknown location into ten sub-groups, i.e. cities. We describe ongoing efforts to further augment the mi-faser results to generate final location estimates for the set of mystery location metagenomes.
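
The k-means step of such a partition can be sketched as below. This is a rough illustration under several assumptions: the t-SNE embedding is taken as already computed and is replaced here by synthetic two-dimensional points, the initialisation is a deterministic farthest-point rule chosen for reproducibility, and three groups are used instead of the ten reported in the abstract. None of this is the authors' mi-faser code.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centre.
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(distances, axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in for a 2-D embedding of functional profiles (hypothetical).
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
embedding = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in blob_centers])

labels, centers = kmeans(embedding, k=3)
print(np.bincount(labels))   # number of samples assigned to each group
```

Each resulting group would then be treated as one candidate city and compared against the functional profiles of the known cities.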

Q-10: A systematic analysis of multiple cancer studies within a novel enhanced framework for semantic data integration
COSI: CAMDA COSI
  • Iliyan Mihaylov, Sofia University St. Kliment Ohridski, Faculty of Mathematics and Informatics, Sofia, Bulgaria
  • Maciej Kańduła, Chair of Bioinformatics Research Group, Boku University Vienna, Poland
  • Dimitar Vassilev, Sofia University St. Kliment Ohridski, Faculty of Mathematics and Informatics, Sofia, Bulgaria

Short Abstract: Integrative approaches to cancer data analysis remain an active field of research, where effective integration of heterogeneous data sources, such as clinical, morphologic and molecular data, is becoming crucial for subtyping and treating cancer. Moreover, measurements need to be combined not only across patients but also across assay types, i.e. both horizontally and vertically. This makes integration a complex problem. Importantly, management of knowledge, data accessibility and usability, and the lack of standards and common interfaces are also well-recognized challenges in bioinformatics. We develop a computational framework for integration of heterogeneous data, where relations between structurally unrelated data sources are inferred both from the data themselves and from additional external sources, seamlessly facilitating knowledge discovery. We develop an enhanced novel universal predictive parameter for survival time prediction in cancer patients, focusing here on the TCGA cancer data sets. Our framework applies multiple machine learning regression-based models and incorporates cross-validation methodologies for effective benchmarking.

Q-11: maTE Detection of MicroRNAs Potentially Responsible for Differential Gene Expression Correlates Well with Differential MicroRNA Expression
COSI: CAMDA COSI
  • Malik Yousef, Zefat College, Israel
  • Jens Allmer, Hochschule Ruhr West, University of Applied Sciences, Mülheim an der Ruhr, Germany

Short Abstract: We applied the novel maTE version to the breast cancer dataset provided in the CAMDA challenge and to other TCGA datasets. The differential expression change for many of the miRNAs correlates well with the differential expression change of their mRNA targets. For others there is no apparent correlation. This is expected as only a subset of the miRNAs cause post-transcriptional regulation via cleaving of their target mRNA. In the future, we hope to apply maTE to a coordinated dataset including miRNA-seq, RNA-seq and protein expression.

Q-12: Benchmarking scRNA-seq clustering methods using multi-parameter ensembles of simulated data and workflows
COSI: CAMDA COSI
  • Xianing Zheng, University of Michigan, United States
  • Jun Z. Li, University of Michigan - Ann Arbor, United States

Short Abstract: Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technology for surveying cell types and state transitions. Today, >350 tools have appeared to address >30 scRNA-seq tasks. However, the community still struggles to identify the best workflow for any given task, including clustering. Benchmarking studies to date have relied on real datasets, using published methods at their default settings. We expanded these efforts by creating an ensemble of truth-known simulated datasets and testing them across many parameter combinations of existing methods. We produced scRNA-seq count matrices by systematically altering five parameters: cluster size, cluster distance, cell library size, cell library size variability, and dropout rate, creating a full combination of 1,024 datasets. We evaluated 15 clustering methods, each with three gene selection methods and five k values, for 225 workflows, and 1,024 × 225 = 230,400 runs. Performance variation over the five parameters revealed strengths/weaknesses of individual methods/workflows. SC3 and Seurat performed well for most of the datasets. RaceID2 was sensitive to the true cluster distance; SIMR and RaceID2 performed well only for moderate-to-high library sizes. The ability to benchmark algorithmic choices using simulations that cover a wide parameter space is essential in developing customized pipelines for each real study.
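
The run count quoted above follows from a full factorial design, which can be enumerated as in this sketch. Four levels per simulation parameter is an assumption inferred from 4^5 = 1,024; the level labels, method names, selector names and k values are placeholders, not the ones used in the study.

```python
from itertools import product

# The five simulation parameters named in the abstract.
PARAMETERS = ["cluster_size", "cluster_distance", "library_size",
              "library_size_variability", "dropout_rate"]
# Assumption: four levels per parameter (placeholder labels).
LEVELS = ["level_1", "level_2", "level_3", "level_4"]

# Every combination of levels across the five parameters defines one dataset.
datasets = [dict(zip(PARAMETERS, combo))
            for combo in product(LEVELS, repeat=len(PARAMETERS))]

# A workflow is one clustering method, one gene selection method and one k.
methods = [f"method_{i}" for i in range(1, 16)]          # 15 placeholder names
gene_selectors = [f"selector_{i}" for i in range(1, 4)]  # 3 placeholder names
k_values = [2, 4, 6, 8, 10]                              # 5 hypothetical k values
workflows = list(product(methods, gene_selectors, k_values))

runs = len(datasets) * len(workflows)
print(len(datasets), len(workflows), runs)   # 1024 225 230400
```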

Q-13: A Machine Learning Framework to Determine Geolocations from Metagenomics Profiling
COSI: CAMDA COSI
  • Lihong Huang, Xiamen University, China
  • Canqiang Xu, Aginome Scientific, China
  • Wenxian Yang, Aginome Scientific, China
  • Rongshan Yu, Xiamen University, China

Short Abstract: Profiling of microbiomes can answer forensic questions including the geographical origin of an environmental sample. However, due to the rich and diverse interactions between microbiomes and their environment, associating microbiome samples with their origins is usually a challenging task. To this end, we developed a machine learning framework to predict the geolocations of microbiome samples based on metagenomics profiling. Specifically, our method uses abundance profiles of a set of species as fingerprints, where the set of species is selected using machine learning algorithms based on their power to differentiate between the cities in the dataset. In addition, the abundance profiles are further binned to binary values according to their percentile in the dataset to avoid potential overfitting due to the small training set. Our results show that once the abundance profiles of the metagenomic data are extracted, data-driven machine learning algorithms can be used to predict the geolocation of an environmental sample from its metagenomic sequencing data with reasonable accuracy.
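
The percentile-based binning can be pictured with the short sketch below. The abstract does not state which percentile is used, so the 50th percentile here is an assumption, as are the function name and the toy abundance matrix.

```python
import numpy as np

def binarize_by_percentile(abundance, percentile=50.0):
    """Return a 0/1 matrix: 1 where a species exceeds its own percentile cutoff across samples."""
    abundance = np.asarray(abundance, dtype=float)
    cutoffs = np.percentile(abundance, percentile, axis=0)   # one cutoff per species (column)
    return (abundance > cutoffs).astype(int)

# Toy relative abundances: rows are samples, columns are selected species (hypothetical).
profiles = np.array([
    [0.10, 0.00, 0.30],
    [0.20, 0.05, 0.10],
    [0.40, 0.01, 0.20],
    [0.30, 0.02, 0.40],
])

fingerprints = binarize_by_percentile(profiles, percentile=50.0)
print(fingerprints)
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [1 1 1]]
```

Reducing each abundance to a single bit discards magnitude information, which limits how closely a classifier can fit noise in a small training set.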